Disclaimer

The original content for the Neural Network and Deep Learning course is produced and developed at Algoritma and is used as the main reference for Algoritma Academy.

Training Objectives

The primary objective of this course is to provide a fun and hands-on session to help participants gain full proficiency in data visualization systems and tools.

  • Data transformation using dplyr library:
    • select()
    • filter()
    • mutate()
    • summarise()
    • group_by() and ungroup()
  • Using ggplotly() to add interactivity to ggplot2 objects
  • Plot publication
    • Export plots to PDF with ggpubr
    • Export interactive plots with the subplot() function from plotly
    • Create an HTML dashboard with flexdashboard
  • Creating a web-based dashboard using the shiny library

Concept Map

Interactive Plotting in R

As data grow in complexity and size, the designer often faces the difficult task of balancing overarching storytelling with specificity in the narrative. The designer must also strike a fine balance between coverage and detail under the all-too-real constraints of static graphs and plots.

Interactive visualization is a means of overcoming these constraints, and as we’ll see later, quite a successful one at that. Quoting Rebecca Barter, the author of superheat: “Interactivity allows the viewer to engage with your data in ways impossible by static graphs. With an interactive plot, viewers can zoom into areas they care about, highlight data points that are relevant to them and hide the information that isn’t.”

We’ll start by reading our data in. The data we’ll be using is the Women in Workforce data, a historical dataset on women’s employment status and earnings by specific occupation from 2013 to 2016, compiled from the Bureau of Labor Statistics and the [Census Bureau](https://www.census.gov/).
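
The path to the course data file is not shown in this text, so the following is only a minimal sketch of the read-in step. It writes a tiny stand-in CSV to a temporary file and reads it back with base R's read.csv(). The file, the occupation labels, and the values are made up for illustration; only the column names are taken from the real data.

```r
# Hypothetical stand-in file: the actual course data path is not shown here.
csv_path <- tempfile(fileext = ".csv")
writeLines(c(
  "year,occupation,percent_female",
  "2013,Occupation A,40",
  "2016,Occupation A,45"
), csv_path)

# Read the CSV into a data frame and inspect its structure
workers <- read.csv(csv_path, stringsAsFactors = FALSE)
str(workers)
```

With the real file, only the path passed to read.csv() would change.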

Data Transformation

Data transformation is a crucial part of preparing our interactive charts. In the past, we’ve relied on R’s base functionality for data preparation. This time, by using dplyr, we’ll learn new techniques that may greatly increase our productivity when working with R.

dplyr is designed as “a grammar of data manipulation”, and works by providing a consistent set of “verbs” that help us solve the most common data manipulation challenges:

  • select(): For select-ing columns
## # A tibble: 2,088 x 3
##     year major_category                      percent_female
##    <dbl> <chr>                                        <dbl>
##  1  2013 Management, Business, and Financial           23.6
##  2  2013 Management, Business, and Financial           30.3
##  3  2013 Management, Business, and Financial           43.5
##  4  2013 Management, Business, and Financial           58.7
##  5  2013 Management, Business, and Financial           41.7
##  6  2013 Management, Business, and Financial           63.5
##  7  2013 Management, Business, and Financial           33.6
##  8  2013 Management, Business, and Financial           27.5
##  9  2013 Management, Business, and Financial           53.5
## 10  2013 Management, Business, and Financial           76.9
## # … with 2,078 more rows
  • filter(): For filter-ing rows
## # A tibble: 522 x 3
##     year major_category                      percent_female
##    <dbl> <chr>                                        <dbl>
##  1  2016 Management, Business, and Financial           23.8
##  2  2016 Management, Business, and Financial           29.5
##  3  2016 Management, Business, and Financial           46.8
##  4  2016 Management, Business, and Financial           58.7
##  5  2016 Management, Business, and Financial           44.9
##  6  2016 Management, Business, and Financial           67.3
##  7  2016 Management, Business, and Financial           39.5
##  8  2016 Management, Business, and Financial           26.8
##  9  2016 Management, Business, and Financial           52.6
## 10  2016 Management, Business, and Financial           75.1
## # … with 512 more rows
  • mutate(): For manipulating columns, either by modifying an existing column or by creating a new one
## # A tibble: 522 x 4
##     year major_category                      percent_female percent_male
##    <dbl> <chr>                                        <dbl>        <dbl>
##  1  2016 Management, Business, and Financial           23.8         76.2
##  2  2016 Management, Business, and Financial           29.5         70.5
##  3  2016 Management, Business, and Financial           46.8         53.2
##  4  2016 Management, Business, and Financial           58.7         41.3
##  5  2016 Management, Business, and Financial           44.9         55.1
##  6  2016 Management, Business, and Financial           67.3         32.7
##  7  2016 Management, Business, and Financial           39.5         60.5
##  8  2016 Management, Business, and Financial           26.8         73.2
##  9  2016 Management, Business, and Financial           52.6         47.4
## 10  2016 Management, Business, and Financial           75.1         24.9
## # … with 512 more rows
  • group_by(): For grouping our data by one or more columns
  • summarise(): For computing summaries of our data
  • ungroup(): For removing the grouping
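
As a minimal sketch, the first three verbs above can be chained with the pipe operator. The workers data frame below is a toy stand-in with made-up values, not the course data. The formula percent_male = 100 - percent_female matches the printed output above, where the two columns sum to 100.

```r
library(dplyr)

# Toy stand-in for the course data (values are made up for illustration)
workers <- data.frame(
  year = c(2013, 2013, 2016, 2016),
  occupation = c("A", "B", "A", "B"),
  major_category = c("Service", "Sales and Office", "Service", "Sales and Office"),
  percent_female = c(40, 60, 45, 70),
  stringsAsFactors = FALSE
)

workers_2016 <- workers %>%
  select(year, major_category, percent_female) %>%  # keep three columns
  filter(year == 2016) %>%                          # keep only the 2016 rows
  mutate(percent_male = 100 - percent_female)       # derive a new column

workers_2016
```

Each verb takes a data frame as its first argument and returns a data frame, which is what makes the chaining possible.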

Without group_by(), summarise() computes the summary over all rows of the specified numeric columns:

## # A tibble: 1 x 2
##   percent_female percent_male
##            <dbl>        <dbl>
## 1           36.3         63.7

With group_by(), summarise() returns one summary row for each value of the grouping column:

## # A tibble: 8 x 3
##   major_category                                     percent_male percent_female
##   <chr>                                                     <dbl>          <dbl>
## 1 Computer, Engineering, and Science                         72.3          27.7 
## 2 Education, Legal, Community Service, Arts, and Me…         44.6          55.4 
## 3 Healthcare Practitioners and Technical                     34.8          65.2 
## 4 Management, Business, and Financial                        53.3          46.7 
## 5 Natural Resources, Construction, and Maintenance           94.2           5.78
## 6 Production, Transportation, and Material Moving            77.9          22.1 
## 7 Sales and Office                                           41.3          58.7 
## 8 Service                                                    53.7          46.3
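
The difference between the two summaries can be sketched on toy data. The values are made up, and mean() is an assumption here: the summary function used for the output above is not shown in this text.

```r
library(dplyr)

# Toy data (made-up values); percent_male is derived as 100 - percent_female
workers_2016 <- data.frame(
  major_category = c("Service", "Service", "Sales and Office", "Sales and Office"),
  percent_female = c(40, 50, 60, 70),
  stringsAsFactors = FALSE
) %>%
  mutate(percent_male = 100 - percent_female)

# No grouping: a single row summarising every observation
overall <- workers_2016 %>%
  summarise(
    percent_female = mean(percent_female),
    percent_male = mean(percent_male)
  )

# Grouped: one row per major_category
by_category <- workers_2016 %>%
  group_by(major_category) %>%
  summarise(
    percent_male = mean(percent_male),
    percent_female = mean(percent_female)
  ) %>%
  ungroup()

overall
by_category
```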

Extra Notes: Why you should use ungroup() after every group_by()

group_by() adds metadata to a data.frame that marks how rows should be grouped. As long as that metadata is there, every transformation you apply after the grouping is carried out within each group and keeps the grouping columns.

See the following example:
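
The original example is not included in this text, so here is a minimal sketch on made-up data. The same mutate() call gives different results depending on whether the grouping metadata is still attached.

```r
library(dplyr)

# Toy data (made-up values)
df <- data.frame(
  major_category = c("Service", "Service", "Sales and Office"),
  total_workers = c(10, 30, 60),
  stringsAsFactors = FALSE
)

grouped <- df %>% group_by(major_category)

# Still grouped: sum() is evaluated within each category
share_grouped <- grouped %>%
  mutate(share = total_workers / sum(total_workers))

# Ungrouped: sum() is evaluated over all rows
share_all <- grouped %>%
  ungroup() %>%
  mutate(share = total_workers / sum(total_workers))

share_grouped$share  # 0.25 0.75 1.00
share_all$share      # 0.10 0.30 0.60
```

Calling ungroup() as soon as the grouped step is finished avoids this kind of silent per-group calculation later in the pipeline.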

  • arrange(): For arranging our rows based on a column value
## # A tibble: 8 x 3
##   major_category                                     percent_male percent_female
##   <chr>                                                     <dbl>          <dbl>
## 1 Healthcare Practitioners and Technical                     34.8          65.2 
## 2 Sales and Office                                           41.3          58.7 
## 3 Education, Legal, Community Service, Arts, and Me…         44.6          55.4 
## 4 Management, Business, and Financial                        53.3          46.7 
## 5 Service                                                    53.7          46.3 
## 6 Computer, Engineering, and Science                         72.3          27.7 
## 7 Production, Transportation, and Material Moving            77.9          22.1 
## 8 Natural Resources, Construction, and Maintenance           94.2           5.78
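
As a small sketch, the ordering shown above (largest percent_female first) can be produced with arrange() and desc(). The three rows are taken from the grouped summary printed earlier. Sorting on desc(percent_female) is an assumption about the original call, and ascending percent_male would give the same order.

```r
library(dplyr)

# Three rows copied from the grouped summary above
by_category <- data.frame(
  major_category = c("Service", "Sales and Office",
                     "Healthcare Practitioners and Technical"),
  percent_male = c(53.7, 41.3, 34.8),
  percent_female = c(46.3, 58.7, 65.2),
  stringsAsFactors = FALSE
)

# desc() reverses the default ascending order
sorted <- by_category %>%
  arrange(desc(percent_female))

sorted
```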

Other useful functions:

  • drop_na(): For dropping any NA rows in specified column(s)
##                  year            occupation        major_category 
##                     0                     0                     0 
##        minor_category         total_workers          workers_male 
##                     0                     0                     0 
##        workers_female        percent_female        total_earnings 
##                     0                     0                     0 
##   total_earnings_male total_earnings_female  wage_percent_of_male 
##                     4                    65                   846
##                  year            occupation        major_category 
##                     0                     0                     0 
##        minor_category         total_workers          workers_male 
##                     0                     0                     0 
##        workers_female        percent_female        total_earnings 
##                     0                     0                     0 
##   total_earnings_male total_earnings_female  wage_percent_of_male 
##                     0                     0                   777
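
The two printouts above look like NA counts per column before and after dropping rows. A sketch of that pattern on made-up data follows, assuming the counts come from colSums(is.na(...)) and that drop_na() (from the tidyr package) was applied to the two earnings columns only.

```r
library(tidyr)

# Toy data with missing values (made up for illustration)
workers <- data.frame(
  occupation = c("A", "B", "C", "D"),
  total_earnings_male = c(50000, NA, 42000, 38000),
  total_earnings_female = c(45000, 30000, NA, 36000),
  wage_percent_of_male = c(90, NA, NA, NA),
  stringsAsFactors = FALSE
)

# Number of NA values per column before cleaning
colSums(is.na(workers))

# Drop rows that have NA in either of the two named columns
workers_clean <- drop_na(workers, total_earnings_male, total_earnings_female)

# wage_percent_of_male was not listed, so its NA values can remain
colSums(is.na(workers_clean))
```

Calling drop_na() without column names would instead drop every row that has an NA in any column.
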
  • n(): For counting the number of rows, either overall or by group
## # A tibble: 1 x 1
##   n_total
##     <int>
## 1    2019
## # A tibble: 8 x 2
##   major_category                                       n_total
##   <chr>                                                  <int>
## 1 Computer, Engineering, and Science                       235
## 2 Education, Legal, Community Service, Arts, and Media     168
## 3 Healthcare Practitioners and Technical                   124
## 4 Management, Business, and Financial                      232
## 5 Natural Resources, Construction, and Maintenance         282
## 6 Production, Transportation, and Material Moving          429
## 7 Sales and Office                                         280
## 8 Service                                                  269
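
A minimal sketch of both uses of n() on made-up data. The column name n_total follows the output above.

```r
library(dplyr)

# Toy data: one row per occupation record
workers <- data.frame(
  major_category = c("Service", "Service", "Sales and Office"),
  stringsAsFactors = FALSE
)

# Count all rows
total <- workers %>%
  summarise(n_total = n())

# Count rows within each group
per_category <- workers %>%
  group_by(major_category) %>%
  summarise(n_total = n()) %>%
  ungroup()

total
per_category
```
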
  • pivot_longer() / pivot_wider(): For reshaping data between long and wide formats
## # A tibble: 16 x 3
##    major_category                                       name           value
##    <chr>                                                <chr>          <dbl>
##  1 Healthcare Practitioners and Technical               percent_male   35.9 
##  2 Healthcare Practitioners and Technical               percent_female 64.1 
##  3 Sales and Office                                     percent_male   41.3 
##  4 Sales and Office                                     percent_female 58.7 
##  5 Education, Legal, Community Service, Arts, and Media percent_male   44.6 
##  6 Education, Legal, Community Service, Arts, and Media percent_female 55.4 
##  7 Service                                              percent_male   53.0 
##  8 Service                                              percent_female 47.0 
##  9 Management, Business, and Financial                  percent_male   53.3 
## 10 Management, Business, and Financial                  percent_female 46.7 
## 11 Computer, Engineering, and Science                   percent_male   72.0 
## 12 Computer, Engineering, and Science                   percent_female 28.0 
## 13 Production, Transportation, and Material Moving      percent_male   77.0 
## 14 Production, Transportation, and Material Moving      percent_female 23.0 
## 15 Natural Resources, Construction, and Maintenance     percent_male   93.5 
## 16 Natural Resources, Construction, and Maintenance     percent_female  6.50
## # A tibble: 8 x 3
##   major_category                                     percent_male percent_female
##   <chr>                                                     <dbl>          <dbl>
## 1 Healthcare Practitioners and Technical                     35.9          64.1 
## 2 Sales and Office                                           41.3          58.7 
## 3 Education, Legal, Community Service, Arts, and Me…         44.6          55.4 
## 4 Service                                                    53.0          47.0 
## 5 Management, Business, and Financial                        53.3          46.7 
## 6 Computer, Engineering, and Science                         72.0          28.0 
## 7 Production, Transportation, and Material Moving            77.0          23.0 
## 8 Natural Resources, Construction, and Maintenance           93.5           6.50
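
The round trip shown above can be sketched with two rows copied from that output. pivot_longer() stores the old column names in name and the cell values in value by default, which matches the column headers printed above. Whether the original call relied on these defaults is an assumption.

```r
library(tidyr)

# Two rows copied from the wide table above
wide <- data.frame(
  major_category = c("Sales and Office", "Service"),
  percent_male = c(41.3, 53.0),
  percent_female = c(58.7, 47.0),
  stringsAsFactors = FALSE
)

# Wide to long: one row per (category, measure) pair
long <- pivot_longer(wide, cols = c(percent_male, percent_female))

# Long back to wide: one column per distinct value of name
wide_again <- pivot_wider(long, names_from = name, values_from = value)

long
wide_again
```

The long format is usually the one ggplot2 expects when a measure such as gender should be mapped to colour or fill.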

Dive Deeper

  • Est. Time: 20 mins

Read in data/youtubetrends.csv and save it as vids, then complete the following steps:

  1. In the vids dataframe, create two new columns: likesperview, which stores the ratio of likes to views, and dislikesperview, which stores the ratio of dislikes to views.
  2. summarise() is compatible with almost all the functions in R. Using the n() function, count the total number of trending videos (1 row = 1 video) in each channel. Keep only the channels that have at least 10 trending videos and save the result as vids_top.
  3. Lastly, transform vids_top into a long-format dataframe using pivot_longer().

Using ggplot + plotly

To wrap up the steps we performed earlier, we’ll start by re-reading, tidying, and transforming the data:

I’ve also copied the earlier transformation steps and saved the result as the workers_gap dataframe. Using this data, we’ll visualize the gender gap between male and female workers in 2016.
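
The structure of workers_gap is not shown in this text, so the sketch below assumes a long-format table like the pivot_longer() output above (columns major_category, name, value) and uses four values copied from it. The chart type, labels, and tooltip text are illustrative choices. The pattern itself is what matters: build a ggplot2 object, then pass it to ggplotly().

```r
library(ggplot2)
library(plotly)

# Assumed long-format structure; values copied from the reshaped table above
workers_gap <- data.frame(
  major_category = c("Sales and Office", "Sales and Office", "Service", "Service"),
  name = c("percent_male", "percent_female", "percent_male", "percent_female"),
  value = c(41.3, 58.7, 53.0, 47.0),
  stringsAsFactors = FALSE
)

# Static plot; the extra text aesthetic is only used for the tooltip
p <- ggplot(workers_gap,
            aes(x = major_category, y = value, fill = name,
                text = paste0(name, ": ", value, "%"))) +
  geom_col(position = "dodge") +
  labs(x = NULL, y = "Share of workers (%)", fill = NULL)

# Convert to an interactive plotly widget and show only our custom tooltip
p_interactive <- ggplotly(p, tooltip = "text")
p_interactive
```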

Dive Deeper

  • Est. Time: 20-25 mins

Using ggplot & plotly, recreate the plot in assets/divedeep2.html file!

Plot Publication

Arrange & Export

  • Use ggarrange() and ggexport() from the ggpubr library to export plots as a simple PDF file:
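
A minimal sketch of this step, using R's built-in mtcars data as a stand-in for the course plots. The output file name and the temporary directory are placeholders.

```r
library(ggplot2)
library(ggpubr)

# Two simple stand-in plots built from the built-in mtcars data
p1 <- ggplot(mtcars, aes(x = wt, y = mpg)) + geom_point()
p2 <- ggplot(mtcars, aes(x = factor(cyl))) + geom_bar()

# Place the plots side by side on one page
arranged <- ggarrange(p1, p2, ncol = 2)

# Write the arranged page to a PDF (placeholder path in a temp directory)
out_file <- file.path(tempdir(), "arranged_plots.pdf")
ggexport(arranged, filename = out_file)
```

ggexport() picks the output device from the file extension, so the same call with a .png file name would write an image instead.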

Flexdashboard

Easy interactive dashboards for R:
- Use R Markdown to publish a group of related data visualizations as a dashboard.
- Support for a wide variety of components including htmlwidgets; base, lattice, and grid graphics; tabular data; gauges and value boxes; and text annotations.

More about flexdashboard:
- https://rmarkdown.rstudio.com/flexdashboard/index.html
- https://rmarkdown.rstudio.com/flexdashboard/using.html#storyboards

Shiny

To keep this notebook light, the Shiny section lives in a separate Rmd file. Please open shiny.Rmd for the next section.